In [1]:
## IMPORTANT: On Colab, we expect your homework to be in the cs189 folder
## Please contact staff if you encounter any problems with installing dependencies
import sys, os
IS_COLAB = 'google.colab' in sys.modules
if IS_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    %cd /content/drive/MyDrive/cs189/hw/hw1
    %pip install -r ./requirements.txt
    !pip install -U kaleido plotly
    import kaleido
    kaleido.get_chrome_sync()

import plotly.io as pio
# Import kaleido to ensure it's available and properly initialized
try:
    import kaleido
    # Initialize kaleido (this ensures it's ready to use)
    # Set renderer to use PNG output with kaleido
    pio.renderers.default = "plotly_mimetype+notebook+png"
    print("✓ Kaleido is available, using PNG renderer")
except ImportError:
    # Fallback to HTML if kaleido is not available
    pio.renderers.default = "plotly_mimetype+notebook"
    print("⚠ Kaleido not found, using HTML renderer")
except Exception as e:
    # If there's any other error, use HTML renderer
    pio.renderers.default = "plotly_mimetype+notebook"
    print(f"⚠ Error initializing kaleido: {e}, using HTML renderer")
✓ Kaleido is available, using PNG renderer
In [3]:
# Initialize Otter
import otter
grader = otter.Notebook("fashion_pt_1.ipynb")

Homework 1.1 – AGI, Everywhere, All at Once

Welcome to Homework 1.1! In this assignment, you will get familiar with common data and visualization tools like numpy, pandas, and plotly. This notebook emphasizes pandas operations throughout, and you will work with DataFrames as your primary data structure.


Due Date: Friday, September 19, 11:59 PM¶

This assignment is due on Friday, September 19, at 11:59 PM. You must submit your work to Gradescope by this deadline. Please refer to the syllabus for the Slip Day policy. No late submissions will be accepted beyond the details outlined in the Slip Day policy.

Submission Tips:¶

  • Plan ahead: We strongly encourage you to submit your work several hours before the deadline. This will give you ample time to address any submission issues.
  • Reach out for help early: If you encounter difficulties, contact course staff well before the deadline. While we are happy to assist with submission issues, we cannot guarantee responses to last-minute requests.

Assignment Overview¶

This notebook contains a series of tasks designed to help you practice and apply key concepts in data manipulation and visualization. You will complete all the TODOs in the notebook, which include both coding and written response questions. Some tasks are open-ended, which allows you to explore and experiment with different approaches.

Key Learning Objectives:¶

  1. Work with numpy and pandas for data manipulation.
  2. Visualize data using plotly and pandas' built-in plotting functions.
  3. Gain experience with organizing and analyzing datasets.
  4. Understand the importance of data exploration and preprocessing.

Grading Breakdown¶

Question   Manual Grading?   Points
0a         No                1
1a         No                1
1b         No                1
1c         Yes               1
1d         No                1
2a         No                2
2b         No                1
2c         Yes               1
2d         Yes               2
3a         No                2
3b         No                2
3c         No                1
3d         Yes               2
3e         No                2
3f         No                1
3g         No                1
3h         Yes               1
3i         No                1
3j         Yes               1
4a         No                1
4b         No                2
4c         No                2
4d         No                2
4e         No                2
4f         No                1
4g         Yes               2
4h         No                1
4i         No                2
4j         Yes               2
Total                        42

Note: "Manual" questions are written response questions that will be graded manually by the course staff. All other questions will be graded automatically by the autograder.


Instructions:¶

  1. Carefully read each question and its requirements.
  2. Complete all TODOs in the notebook. You may add extra lines of code if needed to implement your solution.
  3. For manual questions, provide clear and concise written responses.
  4. Test your code thoroughly to ensure it meets the requirements.

Good luck!

In [4]:
import numpy as np
import pandas as pd
import plotly.express as px
import torchvision
import os
import random
from IPython.display import display

IMPORTANT:¶

  • Do not change the random seed values!!!
  • Before you submit your notebook, remember to set save_models=True and load_saved_models=True. This saves your final model, which we will use for the autograder. Set these to False if you are still tweaking your model setup. We have provided code for saving models - do not change the file names!!
  • When uploading your notebook, make sure to include your model file classifier.joblib in your submission
In [5]:
# Set random seeds for reproducible results
SEED = 189
np.random.seed(SEED)
random.seed(SEED)

# IMPORTANT: set save_models to True to save trained models. YOU NEED TO DO THIS FOR THE AUTOGRADER TO WORK.
import joblib
save_models = True
load_saved_models = True # After training, you can set this to True to load the saved models and not have to re-train them.

Setup¶

Load the Fashion-MNIST dataset¶

In this homework, we will work with the Fashion-MNIST dataset, a widely used benchmark dataset for machine learning. It consists of grayscale 28x28 pixel images of various articles of clothing, making it an excellent dataset for practicing image classification.

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. https://github.com/zalandoresearch/fashion-mnist

This dataset serves as an alternative to the classic MNIST digits dataset, which contains images of handwritten digits. Fashion-MNIST is more challenging and better reflects real-world image classification tasks.

We will load the dataset using torchvision, a PyTorch library that provides popular datasets, models, and transformation tools. While you don't need to fully understand PyTorch for this homework, it's helpful to know that the dataset contains two key components:

  • data: the images themselves, represented as 28x28 grayscale arrays.
  • targets: the class labels for each image, where each label corresponds to a specific article of clothing.

The dataset includes 10 classes, each representing a type of clothing item:

  • T-shirt/top
  • Trouser
  • Pullover
  • Dress
  • Coat
  • Sandal
  • Shirt
  • Sneaker
  • Bag
  • Ankle boot

We will explore this dataset in detail and use it to practice data manipulation, visualization, and machine learning techniques.

In [6]:
# Load the FashionMNIST dataset from torchvision
train_data = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True)

# Extract the image data and convert it to a numpy array of type float
images = train_data.data.numpy().astype(float)

# Extract the target labels as a numpy array
targets = train_data.targets.numpy()

# Create a dictionary mapping class indices to class names
class_dict = {i: class_name for i, class_name in enumerate(train_data.classes)}

# Map the target labels to their corresponding class names
labels = np.array([class_dict[t] for t in targets])

# Create a list of class names in order of their indices
class_names = [class_dict[i] for i in range(len(class_dict))]

# Get the total number of samples in the dataset
n = len(images)

# Ensure class_names is a list of class names (redundant but ensures consistency)
class_names = list(class_dict.values())

# Print dataset information for verification
print("Loaded FashionMNIST dataset with {} samples.".format(n))
print("Classes: {}".format(class_dict))
print("Image shape: {}".format(images[0].shape))  # Shape of a single image
print("Image dtype: {}".format(images[0].dtype))  # Data type of the image array
print("Image type: {}".format(type(images[0])))   # Type of the image object
Loaded FashionMNIST dataset with 60000 samples.
Classes: {0: 'T-shirt/top', 1: 'Trouser', 2: 'Pullover', 3: 'Dress', 4: 'Coat', 5: 'Sandal', 6: 'Shirt', 7: 'Sneaker', 8: 'Bag', 9: 'Ankle boot'}
Image shape: (28, 28)
Image dtype: float64
Image type: <class 'numpy.ndarray'>

Now let's create a DataFrame to organize our data¶

In this class, we will be using a lot of pandas, which is a powerful library for data analysis and manipulation. A DataFrame in pandas is essentially a table where we can store and perform operations on our data.

Why use a DataFrame?¶

A DataFrame allows us to:

  • Organize data into rows and columns for better readability.
  • Perform efficient operations on the data, such as filtering, grouping, and aggregating.
  • Integrate seamlessly with other libraries for visualization and machine learning.
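As a minimal sketch of what these operations look like (the toy DataFrame and its brightness column are hypothetical, not part of the assignment data):

```python
import pandas as pd

# Hypothetical toy data, just to illustrate filtering, grouping, and aggregating
toy = pd.DataFrame({
    'label': ['Bag', 'Bag', 'Coat', 'Coat', 'Coat'],
    'brightness': [10, 20, 30, 40, 50],
})

# Filtering: keep only the 'Coat' rows
coats = toy[toy['label'] == 'Coat']

# Grouping + aggregating: average brightness per label
per_label = toy.groupby('label')['brightness'].mean()

print(len(coats))        # 3
print(per_label['Bag'])  # 15.0
```

We will use exactly these kinds of filter/group/aggregate operations on the Fashion-MNIST DataFrame below.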

Problem 0a¶

Task: Create a DataFrame called df with two columns: image and label. Each row should correspond to an image and its associated label. You can preview the first 5 rows of a DataFrame by calling df.head().

Hints:

  1. What is the current object type of the variable images? Note that pandas expects 1D or 2D data for each value in a DataFrame column. You may need to first convert images to a Python list before using it to create the DataFrame.
  2. Later on, when we use our DataFrame for training, it's best if the values in the image column are ndarray objects. After creating the DataFrame, consider re-casting all the values in the image column to ndarray for consistency.
In [7]:
# TODO: Create a DataFrame with two columns: `image` and `label`

# Convert images to a list (pandas expects 1D or 2D data for each value)
images_list = list(images)

# Create DataFrame with image and label columns
df = pd.DataFrame({
    'image': images_list,
    'label': labels
})

# Re-cast image column values to numpy arrays for consistency
df['image'] = df['image'].apply(lambda x: np.array(x))

# Print the shape and columns of the DataFrame
print("DataFrame shape:", df.shape)
print("DataFrame columns:", df.columns.tolist())
df.head()
DataFrame shape: (60000, 2)
DataFrame columns: ['image', 'label']
Out[7]:
image label
0 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... Ankle boot
1 [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,... T-shirt/top
2 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... T-shirt/top
3 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 33.0... Dress
4 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... T-shirt/top
In [8]:
grader.check("q0")
Out[8]:

q0
passed! 💯

Problem 1: Introduction to pandas and Plotly¶

Now that we have created our DataFrame, let's start analyzing our data. A key aspect of machine learning is understanding the data you are working with, so let's create some visualizations of our dataset.

One of the first steps in data analysis is to check how "balanced" the dataset is. This means examining the distribution of the labels to see if each class appears equally in the dataset. A balanced dataset ensures that no class is overrepresented or underrepresented, which can impact the performance of machine learning models.

Problem 1a: Checking Dataset Balance¶

Task: Calculate the distribution of the label column in the df DataFrame using value_counts() and store it in a variable called label_distribution. Then, determine whether or not our dataset is balanced by comparing the minimum and maximum values of label_distribution. Store the result as a boolean value in the is_balanced variable.

In [9]:
# TODO: Calculate the distribution of labels using `value_counts()`
# TODO: Compare the min and max values of `label_distribution` to determine if the dataset is balanced. 

# Calculate label distribution using value_counts()
label_distribution = df['label'].value_counts()

# Determine if dataset is balanced by comparing min and max values
# Dataset is balanced if min == max (all classes have the same count)
is_balanced = label_distribution.min() == label_distribution.max()

print(f"Label distribution:\n{label_distribution}")
print(f"Is the dataset balanced? {is_balanced}")
Label distribution:
label
Ankle boot     6000
T-shirt/top    6000
Dress          6000
Pullover       6000
Sneaker        6000
Sandal         6000
Trouser        6000
Shirt          6000
Coat           6000
Bag            6000
Name: count, dtype: int64
Is the dataset balanced? True
In [10]:
grader.check("q1a")
Out[10]:

q1a
passed! 🚀

Problem 1b: Grouping Data with groupby()¶

The groupby() function in pandas is a powerful tool for grouping rows based on column values and applying aggregation functions like .size().

Task: Group df by the label column and count the rows in each group using .size().

Example Output:¶

label        count
Ankle boot   6000
Bag          6000
...          ...

Learn more about groupby() here.

In [11]:
# TODO: Group the rows in `df` according to the values in the `label` column. Then, count the number of rows in each group.
label_distribution_groupby = df.groupby('label').size()
In [12]:
grader.check("q1b")
Out[12]:

q1b
passed! 🌈

Problem 1c: Visualizing Label Distribution¶

One of the strengths of pandas is its ability to quickly generate visualizations of data. This is particularly useful for understanding the distribution of your dataset. In this task, we will use pandas' built-in plotting functions to create a visualization of the label distribution in our DataFrame.

Why Visualize Label Distribution?¶

Visualizing the label distribution helps us:

  • Understand the balance of classes in the dataset.
  • Identify any potential biases or imbalances that could affect model performance.
  • Gain insights into the dataset before proceeding with further analysis.

Task:

  1. Use the pandas built-in plotting functions to create a histogram of the label distribution (the x-axis is the class label and the y-axis is the sample count).
  2. Ensure the chart is clear and labeled appropriately for easy interpretation.
In [8]:
# Plotting library to use, default is matplotlib but plotly has more functionality
pd.options.plotting.backend = "plotly" 

# TODO: Plot a histogram of the labels in the DataFrame `df` using the DataFrame's built-in plotting functions (this should be 1 line)
fig = df['label'].value_counts().plot(kind='bar', title='Label Distribution')
fig.update_layout(xaxis_title='Class Label', yaxis_title='Sample Count')
# Display the figure (rendered with the backend configured above)
fig

As a quick refresher, here is the show_images function from lecture. This function visualizes our images and labels each one with its class.

In [9]:
def show_images(images, max_images=40, ncols=5, labels = None, reshape=False):
    """Visualize a subset of images from the dataset.
    Args:
        images (np.ndarray or list): Array of images to visualize [img,row,col].
        max_images (int): Maximum number of images to display.
        ncols (int): Number of columns in the grid.
        labels (np.ndarray, optional): Labels for the images, used for facet titles.
        reshape (bool): If True, reshape flattened (784,) images back to (28, 28).
    Returns:
        plotly.graph_objects.Figure: A Plotly figure object containing the images.
    """
    if isinstance(images, list):
        images = np.stack(images)
    n = min(images.shape[0], max_images) # Number of images to show
    px_height = 220 # Height of each image in pixels
    if reshape:
        images = images.reshape(images.shape[0], 28, 28)
    fig = px.imshow(images[:n, :, :], color_continuous_scale='gray_r', 
                    facet_col = 0, facet_col_wrap=ncols,
                    height = px_height * int(np.ceil(n/ncols)))
    fig.update_layout(coloraxis_showscale=False)
    fig.update_xaxes(showticklabels=False, showgrid=False)
    fig.update_yaxes(showticklabels=False, showgrid=False)
    if labels is not None:
        # Extract the facet number and replace with the label.
        fig.for_each_annotation(lambda a: a.update(text=labels[int(a.text.split("=")[-1])]))
    return fig

Problem 1d: Visualizing Class Examples¶

To better understand the dataset, let's visualize a few examples from each class. This will help us see what the images look like and how they differ across classes.

Task:

  1. Use the pandas groupby function to group the DataFrame by the label column.
  2. Sample 2 images per class.
  3. Use the show_images function to display the images in a grid, with each image labeled by its class name.
In [10]:
# TODO: Get 2 sample images per class and plot them.
# Group by label and sample 2 images from each class
# (groupby.sample keeps the grouping column without the deprecated apply-on-grouping-columns behavior)
examples = df.groupby('label').sample(n=2, random_state=SEED).reset_index(drop=True)

fig = show_images(examples["image"].tolist(), ncols=4, labels=examples["label"].tolist())
fig.show()

In [27]:
grader.check("q1d")
Out[27]:

q1d
passed! 🎉

Problem 2: Understanding Data Structure with Clustering¶

Before training classifiers, we explore the data structure using k-means clustering, an unsupervised learning method. This helps identify patterns and relationships in the dataset.

Why Clustering?

  • Discover Similarities: Group similar clothing items based on pixel values.
  • Data Insights: Understand dataset structure to guide modeling.
  • Simplify Data: Potential preprocessing or dimensionality reduction.

Steps:

  1. Flatten images for clustering (done below).
  2. Apply k-means to group images.
  3. Analyze clusters for patterns.

Before we can apply clustering algorithms or train models, we need to preprocess our images. Most machine learning algorithms expect input data to be in a 1-dimensional format. Currently, our images are in a 2D format with dimensions (28, 28).

Thus, let's first reshape each image from (28, 28) to a 1-dimensional array of size (784,) using the pandas apply() function.

In [18]:
# Flatten each image from (28, 28) to (784,)
# Ensure each image is converted to numpy array and flattened to 1D
def flatten_image(img):
    img_array = np.array(img)
    # If image is 2D (28, 28), reshape to 1D (784,)
    # If already 1D, ensure it's the right shape
    if img_array.ndim == 2:
        return img_array.reshape(-1)
    elif img_array.ndim == 1:
        # Already 1D, but ensure it's exactly 784 elements
        if img_array.size == 784:
            return img_array
        else:
            return img_array.reshape(-1)
    else:
        return img_array.flatten()

df["image"] = df["image"].apply(flatten_image)

# Verify all images are flattened to shape (784,)
assert df['image'].apply(lambda img: img.shape == (784,)).all(), 'Not all images are flattened to shape (784,)'

np.stack(df['image'].values).shape
Out[18]:
(60000, 784)

Problem 2a: K-means Clustering on the Pixels¶

Use K-means clustering to group similar images based on their pixel values. This will help us understand how well the algorithm can identify patterns in the dataset without using the labels.

Task:

  1. Use sklearn's KMeans class to cluster the images into 10 clusters (since there are 10 classes in the dataset). For efficiency, we will only cluster a 1,000-image sample (df_sample).
  2. Create a DataFrame called kmeans_df with the following columns:
    • image: the image data (flattened to 1D arrays of size 784).
    • label: the true class label of the image.
    • cluster: the cluster label assigned by K-means.

Instructions:

  • When clustering, set random_state=SEED for reproducibility.

Expected Output: The kmeans_df DataFrame should look like this:

cluster  label        image
7        Ankle boot   [0.0, 0.0, 0.0, 0.0, 0.0, ...]
6        T-shirt/top  [0.0, 0.0, 0.0, 0.0, 1.0, ...]
In [19]:
# TODO: Perform k-means clustering on the images (10 clusters to match the number of classes)
from sklearn.cluster import KMeans

df_sample = df.sample(n=1000, random_state=SEED)

# Flatten images to 1D arrays for clustering
# Ensure images are flattened: if image is 2D (28x28), reshape to 1D (784,)
X_sample = np.stack([img.flatten() if img.ndim > 1 else img for img in df_sample['image'].values])

# Verify shape: should be (1000, 784)
print(f"X_sample shape: {X_sample.shape}")

# Perform K-means clustering with 10 clusters
kmeans = KMeans(n_clusters=10, random_state=SEED)
cluster_labels = kmeans.fit_predict(X_sample)

# Create DataFrame with cluster assignments
kmeans_df = pd.DataFrame({
    'image': df_sample['image'].values,
    'label': df_sample['label'].values,
    'cluster': cluster_labels
})

kmeans_df.head(3)
X_sample shape: (1000, 784)
Out[19]:
image label cluster
0 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... Bag 1
1 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 54.0,... T-shirt/top 1
2 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... Trouser 2
In [20]:
grader.check("q2a")
Out[20]:

q2a
passed! 💯

Problem 2b: Evaluating K-means Clustering¶

K-means clustering groups data points into clusters based on their similarity. To evaluate how well the clustering algorithm has separated the classes, we can analyze the distribution of true labels within each cluster.

Task:

  1. Use the kmeans_df DataFrame to calculate the distribution of true labels (label) within each cluster (cluster).
  2. Create a stacked bar plot to visualize the label counts per cluster. Each bar should represent a cluster, and the segments of the bar should represent the counts of each label within that cluster.

Hint: If you are running into issues where there are bars "hidden" behind other ones in your Plotly bar chart, try making sure you use fillna(0) or unstack(fill_value=0) after grouping by your K-means clusters.

In [16]:
# TODO: Create a stacked bar plot of the label counts per cluster.
# Group by cluster and label, then count occurrences
cluster_label_counts = kmeans_df.groupby(['cluster', 'label']).size().unstack(fill_value=0)

# Create bar plot and set barmode to 'stack' for stacked bars
fig = cluster_label_counts.plot(
    kind='bar',
    title='Distribution of True Labels in Each K-means Cluster'
)
# Update layout to stack bars
fig.update_layout(barmode='stack')

Problem 2c: Visualizing Clusters¶

To better understand the clusters formed by the K-means algorithm, we will visualize a few sample images from each cluster. This will help us identify patterns or similarities among images within the same cluster.

Task:

  1. For each cluster, randomly sample 7 images.
  2. Use the show_images function to display the sampled images in a grid.
  3. Observe the visual similarities among images in the same cluster.
In [21]:
# TODO: Plot 7 images from each cluster (use the show_images function, 10 rows, 7 columns)
# Sample 7 images from each cluster (explicitly selecting the columns, including
# 'cluster', avoids the deprecated apply-on-grouping-columns behavior in pandas)
cluster_samples = (
    kmeans_df.groupby('cluster')[['image', 'label', 'cluster']]
    .apply(lambda x: x.sample(n=min(7, len(x)), random_state=SEED))
    .reset_index(drop=True)
)

# Reshape images for visualization (from 784 to 28x28)
cluster_images = np.stack([img.reshape(28, 28) for img in cluster_samples['image'].values])

# Create labels showing cluster and true label
cluster_labels_display = [f"Cluster {c}, True: {l}" for c, l in zip(cluster_samples['cluster'], cluster_samples['label'])]

# Display images
show_images(cluster_images, max_images=70, ncols=7, labels=cluster_labels_display, reshape=False)

Problem 2d: Observing Patterns in K-means Clustering¶

Reflecting on the visualizations from the previous part, we observe that the k-means clustering algorithm groups images not only by their clothing category (class) but also by other shared characteristics.

Question: Besides the clothing category, what other visual or structural characteristics of the images might the k-means clustering algorithm be grouping together?

Type your answer here, replacing this text.

Problem 3: Training a Classifier¶

In this section, we will train a machine learning classifier to predict clothing categories from image pixel data. Specifically, we will use a Multi-Layer Perceptron (MLP) classifier, which is a type of neural network.

Workflow Overview¶

We will follow a structured workflow:

  1. Data Preparation: Split the dataset into training and testing sets while maintaining class balance.
  2. Model Training: Train the MLP classifier on the training set.
  3. Model Evaluation: Evaluate the classifier's performance on the test set using metrics like accuracy.
  4. Visualization: Visualize predictions and analyze misclassifications to understand model behavior.

This workflow mirrors the process used in the lecture notebook, but you will implement some of the functions yourself to deepen your understanding.

Creating Train/Test Split¶

As mentioned in lecture, first we will split our dataset into training and testing sets. This is a crucial step in machine learning to evaluate how well a model generalizes to unseen data.

Unlike the lecture, where we used sklearn's train_test_split function, here we split the dataset using pandas functions.

Do not change this function! Otherwise the autograder will likely fail.

In [24]:
df_copy = df.copy()
train_df = df_copy.groupby('label').sample(frac=0.8, random_state=SEED)
test_df = df_copy[~df_copy.index.isin(train_df.index)]
print(f"Training set size: {len(train_df)}")
print(f"Test set size: {len(test_df)}")
Training set size: 48000
Test set size: 12000

Problem 3a: Train MLP Classifier¶

In this task, we will train a Multi-Layer Perceptron (MLP) classifier to predict clothing categories from image data. The MLP is a type of neural network that is well-suited for classification tasks. The demo notebook from lecture 3 could be particularly useful.

Steps to Follow:

  1. Data Normalization:

    • Scale the pixel values of the images to the range [0, 1] for better training performance.
    • Create new variables X_train_sc and X_test_sc for the scaled training and testing data, respectively. Do not overwrite the original X_train and X_test.
  2. Model Training:

    • Use the same MLP configuration (size, hyperparameters) as demonstrated in the lecture 3 notebook.
    • Train the model on the normalized training data.
  3. Loss Curve:

    • Extract the loss curve from the trained model using the model.loss_curve_ attribute.
    • Create a DataFrame called loss_df with two columns: epoch and loss.
    • Use Plotly Express to plot the loss curve, showing how the loss decreases as the number of epochs increases.

Notes:

  • The term "loss" refers to the error (textbook terminology) during training. Minimizing the loss is the goal of the training process.
  • Ensure that the model is trained with reproducibility in mind (e.g., set a random seed to SEED where applicable).
In [22]:
# Importing necessary modules for training and preprocessing
from sklearn.neural_network import MLPClassifier  # Multi-Layer Perceptron Classifier for training
from sklearn.preprocessing import StandardScaler  # StandardScaler for normalizing the data
In [25]:
# flatten features into 1D arrays
X_train = np.stack(train_df['image'].values)
y_train = train_df['label'].values
X_test = np.stack(test_df['image'].values)
y_test = test_df['label'].values

print(f"X_train shape: {X_train.shape}\t y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}\t y_test shape: {y_test.shape}")

# TODO: Train the model using the scaled training data and plot the loss curve (remember to normalize your data!)
# NOTE: Your model must be named `model`

if load_saved_models and os.path.exists('classifier.joblib'):
    model = joblib.load('classifier.joblib')
    # Scale the data the same way the model was trained: pixel values to [0, 1].
    # (Using a different scaler here, e.g. StandardScaler, would make the saved
    # model's predictions meaningless.)
    X_train_sc = X_train / 255.0
    X_test_sc = X_test / 255.0
    # Recreate loss_df from the saved model's loss curve
    loss_df = pd.DataFrame({
        'epoch': range(1, len(model.loss_curve_) + 1),
        'loss': model.loss_curve_
    })
else:
    # Step 1: Normalize the data (scale pixel values to [0, 1])
    # Since pixel values are already in [0, 255], divide by 255 to get [0, 1]
    X_train_sc = X_train / 255.0
    X_test_sc = X_test / 255.0
    
    # Step 2: Train MLP Classifier
    # Using similar configuration as lecture 3 notebook
    model = MLPClassifier(
        hidden_layer_sizes=(100, 50),  # Two hidden layers with 100 and 50 neurons
        max_iter=100,  # Maximum number of iterations
        random_state=SEED,  # For reproducibility
        verbose=False
    )
    
    # Train the model
    model.fit(X_train_sc, y_train)
    
    # Step 3: Create loss curve DataFrame
    loss_df = pd.DataFrame({
        'epoch': range(1, len(model.loss_curve_) + 1),
        'loss': model.loss_curve_
    })

if save_models:
    joblib.dump(model, 'classifier.joblib')

loss_df.plot(x='epoch', y='loss', title="Training Error")
X_train shape: (48000, 784)	 y_train shape: (48000,)
X_test shape: (12000, 784)	 y_test shape: (12000,)

In [26]:
grader.check("q3a")
Out[26]:

q3a
passed! 🚀

Problem 3b: Adding Predictions and Evaluation Metrics to DataFrames¶

Task: Modify both train_df and test_df by adding the following columns and compute train and test accuracy:

  1. predicted_label: The predicted label for each image, as determined by the trained model.
  2. correct: A boolean value indicating whether the predicted label matches the true label (True for correct predictions, False otherwise).
  3. probs: The class probabilities for each image, represented as a list of size 10 (one probability per class).
  4. confidence: The probability associated with the predicted label, representing the model's confidence in its prediction.
In [27]:
# TODO: Add the columns listed above to `train_df` and `test_df`.
train_df = train_df.copy()
test_df = test_df.copy()

# Get predictions for training and test sets
train_predicted_labels = model.predict(X_train_sc)
test_predicted_labels = model.predict(X_test_sc)

# Get class probabilities for training and test sets
train_probs = model.predict_proba(X_train_sc)
test_probs = model.predict_proba(X_test_sc)

# Add predicted_label column
train_df['predicted_label'] = train_predicted_labels
test_df['predicted_label'] = test_predicted_labels

# Add correct column (boolean indicating if prediction matches true label)
train_df['correct'] = train_df['predicted_label'] == train_df['label']
test_df['correct'] = test_df['predicted_label'] == test_df['label']

# Add probs column (list of probabilities for each class)
train_df['probs'] = train_probs.tolist()
test_df['probs'] = test_probs.tolist()

# Add confidence column (probability of the predicted label, i.e. the max class probability)
train_df['confidence'] = train_probs.max(axis=1)
test_df['confidence'] = test_probs.max(axis=1)

print("--- Column Types ----")
for col in train_df.columns:
    val = train_df[col].iloc[0]
    print(f"{col}: {type(val)}")
print("-----------")

# Calculate accuracy
train_accuracy = train_df['correct'].mean()
test_accuracy = test_df['correct'].mean()

print(f"Training accuracy: {train_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
--- Column Types ----
image: <class 'numpy.ndarray'>
label: <class 'str'>
predicted_label: <class 'str'>
correct: <class 'numpy.bool'>
probs: <class 'list'>
confidence: <class 'numpy.float64'>
-----------
Training accuracy: 0.975
Test accuracy: 0.884
In [28]:
grader.check("q3b")
Out[28]:

q3b
passed! 🌈

Problem 3c: Class Accuracy Analysis and Visualization¶

Analyze the model's performance for each class and visualize the class-wise accuracy for both the training and testing datasets.

Task 1: Create a class_accuracy DataFrame¶

  1. Group the train_df and test_df DataFrames by label (class).
  2. Calculate the accuracy for each class as the proportion of correct predictions (correct column).
  3. Add a split column to indicate whether the data is from the training or testing set.
  4. Combine the results into a single DataFrame called class_accuracy with the following columns:
    • split: Indicates whether the data is from the training or testing set.
    • label: The class label.
    • correct: The accuracy for the class.

Task 2: Visualize Class Accuracy¶

  1. Use the class_accuracy DataFrame to create a grouped bar chart.
  2. The x-axis should represent the class labels (label), and the y-axis should represent the accuracy (correct).
  3. Use different colors for the training and testing splits:
    • Training: Blue
    • Testing: Red
  4. Add the actual accuracy values on top of the bars, rounded to two decimal places. To do this you can add text_auto=True to your .plot call. If you want to round these numbers to the nearest 2nd decimal, set text_auto='.2f'

Hints:

  • Use reset_index() after grouping to convert the grouped data into a DataFrame.

For example, after a groupby:

df.groupby(['A', 'B'])['C'].mean()

you get a Series with a multi-index:

    A      B       
    foo    x      0.92
           y      0.85
    bar    x      0.99
           y      0.97
    Name: C, dtype: float64

If you call .reset_index(), you get a DataFrame with columns:

       A    B     C
    0  foo  x  0.92
    1  foo  y  0.85
    2  bar  x  0.99
    3  bar  y  0.97

This makes it much easier to plot or further manipulate the data.
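For instance, the pattern above can be reproduced with a tiny DataFrame (the `A`/`B`/`C` names mirror the illustration, not the homework data):

```python
import pandas as pd

# Toy frame mirroring the A/B/C example above
df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar'],
    'B': ['x', 'y', 'x', 'y'],
    'C': [0.92, 0.85, 0.99, 0.97],
})

# groupby produces a Series with an (A, B) MultiIndex...
grouped = df.groupby(['A', 'B'])['C'].mean()

# ...and reset_index() turns it back into a flat DataFrame
flat = grouped.reset_index()
print(flat.columns.tolist())  # ['A', 'B', 'C']
```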

In [29]:
# TODO: Calculate train and test accuracy per class 
# TODO: Use class_accuracy to create a grouped bar chart of class accuracy for train and test

# Task 1: Calculate accuracy per class for training set
train_class_accuracy = train_df.groupby('label')['correct'].mean().reset_index()
train_class_accuracy['split'] = 'train'

# Calculate accuracy per class for test set
test_class_accuracy = test_df.groupby('label')['correct'].mean().reset_index()
test_class_accuracy['split'] = 'test'

# Combine into single DataFrame
class_accuracy = pd.concat([train_class_accuracy, test_class_accuracy], ignore_index=True)

# Task 2: Create grouped bar chart
# Use plotly for plotting with different colors for train and test
fig = px.bar(
    class_accuracy,
    x='label',
    y='correct',
    color='split',
    barmode='group',
    title='Class Accuracy for Training and Testing Sets',
    labels={'label': 'Class Label', 'correct': 'Accuracy'},
    color_discrete_map={'train': 'blue', 'test': 'red'},
    text_auto='.2f'
)
fig.update_traces(textposition='outside')
fig.show()

print(class_accuracy)
          label   correct  split
0    Ankle boot  0.999375  train
1           Bag  0.998333  train
2          Coat  0.921875  train
3         Dress  0.987500  train
4      Pullover  0.903750  train
5        Sandal  1.000000  train
6         Shirt  0.953542  train
7       Sneaker  0.996042  train
8   T-shirt/top  0.986458  train
9       Trouser  0.999792  train
10   Ankle boot  0.971667   test
11          Bag  0.955000   test
12         Coat  0.800000   test
13        Dress  0.913333   test
14     Pullover  0.754167   test
15       Sandal  0.957500   test
16        Shirt  0.730833   test
17      Sneaker  0.922500   test
18  T-shirt/top  0.858333   test
19      Trouser  0.979167   test
In [30]:
grader.check("q3c")
Out[30]:

q3c
passed! 🎉

Problem 3d: Best and Worst Performing Classes¶

Question:

  • Identify the best and worst performing classes for train and test splits. If tied, list all classes with the same performance.
  • Do the best/worst performing classes match between splits?
  • Do train and test accuracies differ? Why?

From the class_accuracy table above: on the training split, the best class is Sandal (1.000) and the worst is Pullover (0.904); on the test split, the best class is Trouser (0.979) and the worst is Shirt (0.731). The best/worst classes do not match exactly between splits, although the hardest classes overlap: Pullover, Coat, and Shirt have the lowest accuracy in both splits, while Trouser and Sandal are near the top in both. Train accuracies are uniformly higher than test accuracies (0.975 vs. 0.884 overall) because the model's parameters were fit to the training data; the gap reflects generalization error and some degree of overfitting.

Problem 3e: Create Confusion Matrix¶

An often easier way to understand model performance is with a confusion matrix, which shows how often predictions match the true labels and where errors occur.


Refresher:¶

  1. Precision: Measures the accuracy of positive predictions for a class. $$ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} $$

  2. Recall: Measures the ability to identify all positive samples for a class. $$ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} $$
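As a sanity check before the 10-class case, here is how both formulas read directly off a toy 2×2 confusion matrix (rows = true labels, columns = predicted labels; the counts are made up for illustration):

```python
import numpy as np

# Rows are true labels, columns are predicted labels
cm = np.array([[8, 2],    # class 0: 8 correct, 2 predicted as class 1
               [1, 9]])   # class 1: 1 predicted as class 0, 9 correct

tp = np.diag(cm)                 # true positives per class
precision = tp / cm.sum(axis=0)  # column sums give TP + FP
recall = tp / cm.sum(axis=1)     # row sums give TP + FN

print(precision)  # [8/9, 9/11]
print(recall)     # [0.8, 0.9]
```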


Tasks:

  1. Hand-implement a confusion matrix:
  • Use numpy operations to compute a 10x10 matrix where rows represent true labels and columns represent predicted labels.
  2. Visualize the confusion matrix:
  • Use a heatmap to display the matrix for better interpretability. The Y axis should be the true label and the X axis should be the predicted label.
  3. Using your confusion matrix, evaluate performance:
  • Compute overall test accuracy.
  • Calculate precision and recall for each class using the confusion matrix.
In [53]:
# Initialize confusion matrix with zeros
conf_matrix = np.zeros((len(class_names), len(class_names)), dtype=int)
class_to_idx = {class_name: idx for idx, class_name in enumerate(class_names)}

# Fill the confusion matrix by counting predictions and plot it as a heatmap

# Get true labels and predicted labels for test set
y_true = test_df['label'].values
y_pred = test_df['predicted_label'].values

# Hand-implement the confusion matrix with numpy: increment the
# (true, predicted) cell for every test example
true_idx = np.array([class_to_idx[label] for label in y_true])
pred_idx = np.array([class_to_idx[label] for label in y_pred])
np.add.at(conf_matrix, (true_idx, pred_idx), 1)

# Plot confusion matrix as heatmap using plotly
import plotly.graph_objects as go

fig = go.Figure(data=go.Heatmap(
    z=conf_matrix,
    x=class_names,
    y=class_names,
    colorscale='Viridis',
    text=conf_matrix,
    texttemplate='%{text}',
    textfont={"size": 10},
    colorbar=dict(title="Count")
))

fig.update_layout(
    title='Confusion Matrix',
    xaxis_title='Predicted Label',
    yaxis_title='True Label',
    width=800,
    height=700
)

fig.show()
In [ ]:
# Calculate accuracy from confusion matrix
# Accuracy = sum of diagonal (correct predictions) / total predictions
accuracy_from_matrix = np.trace(conf_matrix) / np.sum(conf_matrix)
print(f"\nAccuracy calculated from confusion matrix: {accuracy_from_matrix:.3f}")

# Calculate per-class metrics from confusion matrix
per_class_metrics = []
print("\nPer-class metrics from confusion matrix:")
for i, class_name in enumerate(class_names):
    # True Positives: diagonal element (correct predictions for this class)
    true_positives = conf_matrix[i, i]
    
    # False Positives: sum of column i (excluding diagonal) - predicted as this class but actually other classes
    false_positives = np.sum(conf_matrix[:, i]) - true_positives
    
    # False Negatives: sum of row i (excluding diagonal) - actually this class but predicted as other classes
    false_negatives = np.sum(conf_matrix[i, :]) - true_positives
    
    # Calculate precision and recall
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    
    per_class_metrics.append({
        'class': class_name,
        'precision': precision,
        'recall': recall
    })
    
pd.DataFrame(per_class_metrics)
Accuracy calculated from confusion matrix: 0.884

Per-class metrics from confusion matrix:
Out[ ]:
class precision recall
0 T-shirt/top 0.803432 0.858333
1 Trouser 0.976725 0.979167
2 Pullover 0.841860 0.754167
3 Dress 0.895425 0.913333
4 Coat 0.817021 0.800000
5 Sandal 0.957500 0.957500
6 Shirt 0.696585 0.730833
7 Sneaker 0.955959 0.922500
8 Bag 0.976150 0.955000
9 Ankle boot 0.932800 0.971667
In [55]:
grader.check("q3e")
Out[55]:

q3e
passed! 🍀

Problem 3f: Analyze Prediction Confidence¶

In this section, we will analyze the model's prediction confidence to better understand its behavior. Specifically, we will identify examples where the model is uncertain or overly confident, and evaluate how these cases relate to the correctness of its predictions.

Objectives:¶

  1. Find the Image with the Lowest Confidence:

    • Identify the image for which the model has the least confidence in its prediction.
  2. Analyze Low Confidence but Correct Predictions:

    • Find examples where the model made the correct prediction but with low confidence.
  3. Analyze High Confidence but Incorrect Predictions:

    • Identify examples where the model is highly confident but makes incorrect predictions.

Task: Let’s start by finding the image with the lowest confidence.

In [56]:
# TODO: Find the image with the lowest confidence by sorting the `confidence` column of `test_df`
# Sort all rows by confidence in ascending order (lowest confidence first)
least_confident = test_df.sort_values('confidence', ascending=True).reset_index(drop=True)
print("Image with lowest confidence:")
print(least_confident[['label', 'predicted_label', 'confidence', 'correct']][:3])

# Show image with lowest confidence and its predicted label (show first 8 for visualization)
show_labels = [f"{label} (Pred: {predicted_label})" for label, predicted_label in zip(least_confident["label"].tolist()[:8], least_confident["predicted_label"].tolist()[:8])]
fig = show_images(np.stack(least_confident["image"].tolist()[:8]), 8, ncols=4, labels=show_labels, reshape=True)
fig.show()
Image with lowest confidence:
         label predicted_label  confidence  correct
0  T-shirt/top            Coat    0.327358    False
1        Shirt     T-shirt/top    0.330295    False
2         Coat            Coat    0.343291     True
In [57]:
grader.check("q3f")
Out[57]:

q3f
passed! 🍀

Problem 3g: Investigating Class Confusion for "Ankle boot"¶

Task: Visualize Low-Confidence Correct Predictions: Display 10 test images where the true label is "Ankle boot," the prediction is correct, but confidence is lowest.

In [59]:
# TODO: Visualize 10 images from the `test_set` whose true label is `Ankle boot` that the model correctly classified but with low confidence
# Filter for rows where both true label and predicted label are 'Ankle boot'
test_df_boot = test_df[(test_df['label'] == 'Ankle boot') & (test_df['predicted_label'] == 'Ankle boot')].copy()

# Find low confidence correct predictions (uncertain but right)
# Since test_df_boot already contains only correct predictions, we just need to sort by confidence
low_conf_correct = test_df_boot.nsmallest(10, 'confidence')

# Visualize low confidence correct predictions
if len(low_conf_correct) > 0:
    show_labels = [f"True: {label} (Pred: {pred_label}, Conf: {conf:.3f})" 
                   for label, pred_label, conf in zip(
                       low_conf_correct["label"].tolist(), 
                       low_conf_correct["predicted_label"].tolist(),
                       low_conf_correct["confidence"].tolist()
                   )]
    fig = show_images(np.stack(low_conf_correct["image"].tolist()), max_images=10, ncols=5, labels=show_labels, reshape=True)
    fig.show()
else:
    print("No low confidence correct predictions found for Ankle boot")
In [60]:
grader.check("q3g")
Out[60]:

q3g
passed! 🍀

Problem 3h: Reasons for Low Confidence in the "Ankle boot" Class¶

Task: Analyze visual patterns in low-confidence images for the "Ankle boot" class. What are potential reasons for the model to be so unconfident in these classifications?

Answer:

After visualizing the low-confidence correct predictions for "Ankle boot" class, I observe the following visual patterns:

Potential reasons for low confidence:

  1. Visual ambiguity: Some ankle boot images may have features that resemble other footwear classes (like sneakers or sandals), making the model uncertain even when it makes the correct prediction.

  2. Unusual angles or orientations: Images taken from non-standard angles may lack clear distinguishing features that the model relies on for confident classification.

  3. Partial occlusion or unusual backgrounds: Some images may have parts of the boot obscured or confusing backgrounds that reduce model confidence.

  4. Similarity to training data: If the low-confidence images differ significantly from the typical ankle boot images in the training set, the model may be less confident even when correct.

  5. Class overlap: Ankle boots share visual characteristics with other footwear classes (sneakers, sandals), leading to lower confidence scores as the model considers multiple possibilities.


Problem 3i: Investigating Class Confusion for "Trouser"¶

Now let's look at cases where the model is confidently incorrect.

Task: For the Trouser class, visualize the 10 images from the test set which are incorrectly classified as Dress but have the highest confidence, and answer the question below.

In [58]:
# TODO: Visualize 10 images from the `test_set` whose true label is `Trouser` that the model incorrectly classified as `Dress` with high confidence
test_df_trouser = test_df[test_df['label'] == 'Trouser'].copy()

# Find high confidence incorrect predictions (confident but wrong)
# Filter for incorrect predictions where predicted label is 'Dress'
high_conf_incorrect = test_df_trouser[
    (test_df_trouser['correct'] == False) & 
    (test_df_trouser['predicted_label'] == 'Dress')
].nlargest(10, 'confidence')

# Visualize high confidence incorrect predictions
if len(high_conf_incorrect) > 0:
    show_labels = [f"True: {label} (Pred: {pred_label}, Conf: {conf:.3f})" 
                   for label, pred_label, conf in zip(
                       high_conf_incorrect["label"].tolist(), 
                       high_conf_incorrect["predicted_label"].tolist(),
                       high_conf_incorrect["confidence"].tolist()
                   )]
    fig = show_images(np.stack(high_conf_incorrect["image"].tolist()), max_images=10, ncols=5, labels=show_labels, reshape=True)
    fig.show()
else:
    print("No high confidence incorrect predictions found for Trouser -> Dress")
In [49]:
grader.check("q3i")
Out[49]:

q3i
passed! ✨

Problem 3j: Reasons for High Confidence in the "Trouser" Class¶

Task: What are some potential reasons for the model to be so confident in its classifications of some of these examples?

Answer:

After visualizing the high-confidence incorrect predictions where "Trouser" is misclassified as "Dress", I observe the following:

Potential reasons for high confidence in incorrect predictions:

  1. Visual similarity: Some trouser images may have features that strongly resemble dresses (e.g., wide-leg trousers, flowing fabrics, similar silhouettes), leading the model to confidently but incorrectly classify them.

  2. Shared features: Both trousers and dresses can have:

    • Similar fabric patterns or textures
    • Overlapping color schemes
    • Similar overall shapes when viewed from certain angles
  3. Training data bias: If the training set has more examples of dresses with trouser-like features, the model may learn to associate those features with dresses, leading to confident misclassification.

  4. Feature extraction limitations: The MLP may be focusing on certain pixel patterns that are common to both classes, rather than learning the distinguishing features that separate trousers from dresses.

  5. Model overconfidence: The model may have learned patterns that work well for most cases but fail on edge cases, yet still assign high confidence due to the strength of those learned patterns.


Now that we have become more familiar with the modeling process, let’s look at how we can augment our data and how these augmentations affect our classifier.

Problem 4: Image Augmentation via Transformation Matrices¶

In this problem, you will explore how to implement image augmentations, such as rotation, flipping, and scaling, using matrix multiplication. The goal is to construct a transformation matrix $T$ such that, when multiplied by a flattened image vector, it produces the augmented image:

$$\text{augmented\_image} = T \cdot \text{original\_image} = \text{original\_image} \cdot T^T$$

Each transformation matrix $T$ will be of size $N \times N$, where $N$ is the total number of pixels in the image (e.g., for a 28×28 image, $N=784$). Each row of $T$ defines how to compute the value of a single output pixel as a weighted sum of the input pixels.


Why Use a Transformation Matrix?¶

Using a matrix for image transformations has several advantages:

  1. Efficiency: Matrix multiplication is computationally efficient and can be optimized for hardware acceleration.
  2. Composability: Multiple transformations (e.g., rotation followed by scaling) can be combined into a single matrix by multiplying their respective transformation matrices.
  3. Flexibility: Any linear transformation, including interpolation, can be represented as a matrix.

Example: Horizontal Flip Matrix¶

Let’s consider a simple example of flipping a 3×3 image horizontally. The flattened image is ordered row-wise:

Original indices: $$\begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{bmatrix}$$

After a horizontal flip, the columns are reversed: $$\begin{bmatrix} 2 & 1 & 0 \\ 5 & 4 & 3 \\ 8 & 7 & 6 \end{bmatrix}$$

The transformation matrix $T$ for this operation is a permutation matrix that swaps the columns for each row. For a 3×3 image, $T$ is a 9×9 matrix where each row has a single 1 in the position corresponding to the flipped pixel, and 0 elsewhere.
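A minimal sketch of this 9×9 permutation matrix, using the index image from the example above; note that applying the flip twice recovers the identity matrix, which is the composability property in action:

```python
import numpy as np

h = w = 3
N = h * w
T = np.zeros((N, N), dtype=int)
for i in range(h):
    for j in range(w):
        # pixel (i, j) moves to (i, w - 1 - j)
        T[i * w + (w - 1 - j), i * w + j] = 1

img = np.arange(N)        # the flattened index image [0..8]
flipped = T @ img
print(flipped.reshape(3, 3))
# [[2 1 0]
#  [5 4 3]
#  [8 7 6]]

# Composing the flip with itself yields the identity matrix
assert np.array_equal(T @ T, np.eye(N, dtype=int))
```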


In this question, you will:

  1. Understand Transformation Matrices:

    • Learn how to construct transformation matrices for common operations like shifting, blurring, and rotating.
  2. Implement Augmentations:

    • Write code to generate transformation matrices for the following operations:
      • Shifting: Move the image left, right, up, or down.
      • Blurring: Apply a smoothing effect by averaging neighboring pixels.
      • Rotating: Rotate the image by a specified angle.
  3. Combine Transformations:

    • Experiment with combining multiple transformations into a single matrix and observe the results.

Each method will consist of two steps:

  1. Create the Transformation Matrix:
    Construct a 784x784 transformation matrix that represents the desired image augmentation (e.g., rotation, flipping, scaling). Each row of the matrix determines how the value of a single output pixel is computed as a weighted sum of the input pixels.

  2. Apply the Transformation:
    Use the apply_transformation function (provided below) to apply the transformation matrix to your image. This function will handle the matrix multiplication and reshape the output back into the original image dimensions.

Example: Vertical Flip

To help you get started, we have implemented a simple vertical flip as an example. This transformation matrix swaps the rows of the image, flipping it vertically.

In [61]:
def apply_transformation(image, T):
    # Input: a (784,) flattened image or an (N, 784) batch of flattened images,
    # and a (784, 784) transformation matrix
    # Output: transformed image(s) with the same shape as the input
    transformed_flat = image @ T.T
    return transformed_flat.reshape(image.shape)
In [62]:
def create_vertical_flip_matrix(height=28, width=28):
    """
    Returns a (height*width, height*width) matrix that vertically flips an image
    when applied to its flattened vector. Values are 0 or 1.
    """
    N = height * width  # Total number of pixels in the image
    T = np.zeros((N, N), dtype=int)  # Initialize the transformation matrix with zeros
    for i in range(height):  # Loop over each row
        for j in range(width):  # Loop over each column
            orig_idx = i * width + j  # Compute the flattened index for the original pixel
            flipped_i = height - 1 - i  # Compute the row index after vertical flip
            flipped_idx = flipped_i * width + j  # Compute the flattened index for the flipped pixel
            # Set the corresponding entry in the transformation matrix to 1
            # This means the pixel at (i, j) moves to (flipped_i, j)
            T[flipped_idx, orig_idx] = 1
    return T

def vertical_flip(image):
    T_flip = create_vertical_flip_matrix()
    return apply_transformation(image, T_flip)
In [63]:
test_image = np.load("test_image.npy")

flipped_image = vertical_flip(test_image)
show_images(np.stack([test_image, flipped_image]), labels=['Original', 'Flipped'], reshape=True)

Problem 4a: Horizontal Flip¶

Now, let's implement a horizontal flip transformation using a matrix. A horizontal flip mirrors the image along its vertical axis. For example, the leftmost column becomes the rightmost column.

Steps:

  1. Understand the Transformation Matrix:

    • The matrix T is N x N (where N = height * width).
    • Each row of T has a single 1 to indicate the new position of a pixel after the flip.
  2. Construct the Matrix:

    • For each pixel (i, j), compute its new position (i, width - 1 - j).
  3. Apply the Transformation:

    • Use the apply_transformation function to apply T to the flattened image.

Hints:

  • Adjust the flipped_j and flipped_idx variables for the horizontal flip.
  • Ensure the function returns a flattened image after applying the transformation.
  • Fill any empty spaces in the transformed image with 0
In [64]:
def create_horizontal_flip_matrix(height=28, width=28):
    """
    Returns a (height*width, height*width) matrix that horizontally flips an image
    when applied to its flattened vector. Values are 0 or 1.
    """
    N = height * width
    T = np.zeros((N, N), dtype=int)
    for i in range(height):
        for j in range(width):
            orig_idx = i * width + j
            # Horizontal flip: column j becomes width - 1 - j
            flipped_j = width - 1 - j
            flipped_idx = i * width + flipped_j
            T[flipped_idx, orig_idx] = 1
    return T

def horizontal_flip(image):
    T_flip = create_horizontal_flip_matrix()
    return apply_transformation(image, T_flip)

flipped_image = horizontal_flip(test_image)

show_images(np.stack([test_image, flipped_image]), labels=['Original', 'Horizontal Flipped'], reshape=True)
In [65]:
grader.check("q4a")
Out[65]:

q4a
passed! 🌈

Problem 4b: Image Shifting¶

Task: Implement a function to shift images by a specified number of pixels in any direction.

Steps:

  • Create a function that shifts an image by dx pixels horizontally and dy pixels vertically.
  • Fill empty spaces with 0s.
  • Handle cases where the shift moves parts of the image outside the boundaries.
  • Return the shifted image as a flattened array.

Hint:
Think of copying pixels from a source region in the original image to a destination region in the final image. For example:

  • If dx is positive (shift right), the source x-range starts at 0 and ends at 28 - dx.
  • If dx is negative (shift left), the source x-range starts at -dx and ends at 28.
  • If dy is positive (shift down), the source y-range starts at 0 and ends at 28 - dy.
  • If dy is negative (shift up), the source y-range starts at -dy and ends at 28.

Ensure the function returns a flattened image.

Fill any empty spaces in the transformed image with 0
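Before building the 784×784 matrix, a direct array-slicing version can serve as a reference for what the output should look like. This `shift_naive` helper is an illustrative sketch, not the required matrix approach; here `dy > 0` moves content toward larger row indices:

```python
import numpy as np

def shift_naive(img, dx, dy):
    """Reference shift on a 2D image: dx > 0 moves content right,
    dy > 0 moves it toward larger row indices. Vacated pixels are 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    # Source region that survives the shift
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = img[y0:y1, x0:x1]
    return out

img = np.arange(9).reshape(3, 3)
print(shift_naive(img, 1, 0))
# [[0 0 1]
#  [0 3 4]
#  [0 6 7]]
```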

In [ ]:
def create_shift_matrix(dx, dy, height=28, width=28):
    """
    Create a transformation matrix for shifting an image by dx pixels horizontally and dy pixels vertically.

    Args:
        dx (int): Number of pixels to shift horizontally.
        dy (int): Number of pixels to shift vertically.
        height (int): Height of the image.
        width (int): Width of the image.

    Returns:
        np.ndarray: A (height*width, height*width) transformation matrix for shifting.
    """
    N = height * width
    T = np.zeros((N, N))
    
    # For each pixel in the output image, find which pixel from the input image it comes from
    for i in range(height):
        for j in range(width):
            # Destination position (where we're writing to)
            dest_i = i
            dest_j = j
            dest_idx = dest_i * width + dest_j
            
            # Source position (where we're reading from)
            # Shift: move dx pixels horizontally, dy pixels vertically
            src_i = i - dy  # If dy > 0 (shift down), we read from rows above
            src_j = j - dx  # If dx > 0 (shift right), we read from left columns
            
            # Check if source is within bounds
            if 0 <= src_i < height and 0 <= src_j < width:
                src_idx = src_i * width + src_j
                T[dest_idx, src_idx] = 1
    
    return T


def shift_image(image, dx, dy):
    """
    Shift an image by dx pixels horizontally and dy pixels vertically.

    Args:
        image (np.ndarray): Flattened image array of shape (height*width,).
        dx (int): Number of pixels to shift horizontally.
        dy (int): Number of pixels to shift vertically.

    Returns:
        np.ndarray: Shifted image as a flattened array.
    """
    T = create_shift_matrix(dx, dy)
    return apply_transformation(image, T)

shifted_right_image = shift_image(test_image, 5, 0)
shifted_left_image = shift_image(test_image, -5, 0)
shifted_up_image = shift_image(test_image, 0, -5)
shifted_down_image = shift_image(test_image, 0, 5)

all_images = np.stack([test_image, shifted_up_image, shifted_down_image, shifted_right_image, shifted_left_image])
plot_labels = ['Original', 'Shifted Up', 'Shifted Down', 'Shifted Right', 'Shifted Left']
show_images(all_images, labels=plot_labels, reshape=True)
In [69]:
grader.check("q4b")
Out[69]:

q4b
passed! 🙌

Problem 4c: Image Blurring¶

Task
Implement a blurring function using a transformation matrix that averages the values of neighboring pixels.


What is blurring?
Blurring reduces the sharpness of an image by averaging each pixel with its neighbors, creating a smoother appearance.
This is done with a sliding square kernel (window) that moves across the image.
For each pixel, the kernel specifies which surrounding pixels contribute to the average.


Key Concepts

  • Kernel Size
    Controls how many neighbors are included in the average.
    • A 3×3 kernel averages a pixel with its 8 immediate neighbors.
    • A 5×5 kernel averages a pixel with its 24 neighbors.
  • Blurring Process
    1. For each pixel, place a square kernel centered on that pixel.
    2. Collect all pixels that fall inside the kernel and inside the image.
    3. Compute the average of these valid pixels and assign it to the center pixel.

Edge handling:
If the kernel extends beyond the image border, only the pixels that actually overlap the image are averaged.


Example with a 4×4 image using a 3×3 kernel

Original 4×4 image:

\begin{bmatrix} 10 & 20 & 30 & 40 \\ 15 & 25 & 35 & 45 \\ 50 & 60 & 70 & 80 \\ 55 & 65 & 75 & 85 \end{bmatrix}

Consider the pixel with value 25 in row 2 col 2 [index (1, 1) in the matrix].
Its 3×3 window contains:

\begin{bmatrix} 10 & 20 & 30 \\ 15 & \textbf{25} & 35 \\ 50 & 60 & 70 \end{bmatrix}

The blurred value for this position is the average of the numbers in this window (35).

For a corner pixel like (0, 0), the 3×3 window lies partly outside the image, so we average only the four valid neighbors: $$\frac{10 + 20 + 15 + 25}{4} = 17.5$$

Applying this process to every pixel produces a softened 4×4 image.
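The 4×4 worked example above can be checked directly with plain nested loops (no transformation matrix yet), clipping each window to the image so edge pixels average fewer neighbors:

```python
import numpy as np

img = np.array([[10, 20, 30, 40],
                [15, 25, 35, 45],
                [50, 60, 70, 80],
                [55, 65, 75, 85]], dtype=float)

h, w, pad = 4, 4, 1  # 3x3 kernel -> pad of 1
blurred = np.zeros_like(img)
for i in range(h):
    for j in range(w):
        # Clip the window to the image so edges average only valid pixels
        window = img[max(0, i - pad):i + pad + 1, max(0, j - pad):j + pad + 1]
        blurred[i, j] = window.mean()

print(blurred[1, 1])  # 35.0  (full 3x3 window)
print(blurred[0, 0])  # 17.5  (corner: only 4 valid neighbors)
```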


Steps:

  1. Implement a function that, for each pixel, averages over a centered square window (kernel) of odd size (e.g., 3, 5, 7).
    Handle edges by averaging only the valid neighbors.
  2. Use a transformation matrix to apply this operation to the entire image.
  3. Ensure the function works for any odd kernel size (e.g., 3x3, 5x5).
  4. Return the blurred image as a flattened array.

Fill any empty spaces in the transformed image with 0

In [70]:
def create_blur_matrix(kernel_size=3, height=28, width=28):
    """
    Create a transformation matrix T that applies a uniform mean blur using a centered, odd-sized square sliding window.

    For each output pixel (i, j):
      1) Place a `kernel_size × kernel_size` window centered at (i, j).
      2) If part of the window falls outside the image, it will have fewer neighbors (only average the pixels that exist)

    Args:
        kernel_size (int): Size of the square kernel (must be odd).
        height (int): Height of the image.
        width (int): Width of the image.

    Returns:
        np.ndarray: A (height*width, height*width) transformation matrix for blurring.
    """
    N = height * width
    T = np.zeros((N, N))
    pad = kernel_size // 2
    
    # For each output pixel (i, j), compute the average of its kernel_size x kernel_size neighborhood
    for i in range(height):
        for j in range(width):
            output_idx = i * width + j
            
            # Find all pixels in the kernel centered at (i, j)
            valid_pixels = []
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    ni = i + di  # neighbor row
                    nj = j + dj  # neighbor column
                    
                    # Check if neighbor is within image bounds
                    if 0 <= ni < height and 0 <= nj < width:
                        input_idx = ni * width + nj
                        valid_pixels.append(input_idx)
            
            # Each valid pixel contributes equally (uniform mean blur)
            if len(valid_pixels) > 0:
                weight = 1.0 / len(valid_pixels)
                for input_idx in valid_pixels:
                    T[output_idx, input_idx] = weight
    
    return T

def blur_image(image, kernel_size=3):
    """
    Apply a blur transformation to a flattened image array or a batch of flattened images.

    Args:
        image (np.ndarray): Flattened image array of shape (height*width,) or batch of images (N, height*width).
        kernel_size (int): Size of the square kernel to use for blurring.

    Returns:
        np.ndarray: Blurred image(s) as a flattened array or batch of arrays.
    """
    T = create_blur_matrix(kernel_size)
    return apply_transformation(image, T)

blurred_1x1 = blur_image(test_image, kernel_size=1)
blurred_3x3 = blur_image(test_image, kernel_size=3)
blurred_5x5 = blur_image(test_image, kernel_size=5)

blurred_images = [test_image, blurred_1x1, blurred_3x3, blurred_5x5]
blurred_labels = ['Original', 'Blur 1x1', 'Blur 3x3', 'Blur 5x5']

show_images(blurred_images, labels=blurred_labels, reshape=True)
In [71]:
grader.check("q4c")
Out[71]:

q4c
passed! 🚀

Problem 4d: Image Rotation¶

Task: Implement a function to rotate an image by a given angle theta (in degrees).

Steps:

  1. Create the Rotation Matrix:

    • Write a function create_rotation_matrix(theta) that generates a transformation matrix to rotate a flattened image by theta degrees.
    • Convert theta from degrees to radians using np.deg2rad(theta) before applying trigonometric functions.
    • Ensure the center of rotation is the center of the image.
  2. Apply the Transformation:

    • The output should be a transformation matrix of shape (height*width, height*width).
    • When this matrix is multiplied by the flattened image, it should produce the rotated image (also flattened).

Hint: Use trigonometric functions (sin, cos) to calculate the new positions of pixels after rotation.
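As a quick check of the trigonometry, rotating the point (1, 0) by 90° counterclockwise with the standard 2×2 rotation matrix should land on (0, 1):

```python
import numpy as np

theta = np.deg2rad(90)  # convert degrees to radians first
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

p = R @ np.array([1.0, 0.0])
print(np.round(p, 6))  # [0. 1.]
```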

In [160]:
def create_rotation_matrix(theta, height=28, width=28):
    """
    Create a transformation matrix for rotating an image by theta degrees.

    Args:
        theta (float): Angle of rotation in degrees.
        height (int): Height of the image.
        width (int): Width of the image.

    Returns:
        np.ndarray: A (height*width, height*width) transformation matrix for rotating.
    """
    theta = np.deg2rad(theta)
    N = height * width
    T = np.zeros((N, N))

    ci = (height - 1) / 2.0
    cj = (width - 1) / 2.0

    cos_t = np.cos(theta)
    sin_t = np.sin(theta)

    for i in range(height):
        for j in range(width):
            out_idx = i * width + j

            # image coordinate system (y axis points down)
            x = j - cj
            y = ci - i

            # inverse mapping (rotate backwards by -theta to find source)
            src_x = cos_t * x + sin_t * y
            src_y = -sin_t * x + cos_t * y

            src_j = src_x + cj
            src_i = ci - src_y

            src_i = int(np.round(src_i))
            src_j = int(np.round(src_j))

            if 0 <= src_i < height and 0 <= src_j < width:
                in_idx = src_i * width + src_j
                T[out_idx, in_idx] = 1

    return T

def rotate_image(image, theta):
    """
    Apply a rotation transformation to a flattened image array or a batch of flattened images.

    Args:
        image (np.ndarray): Flattened image array of shape (height*width,) or batch of images (N, height*width).
        theta (float): Angle of rotation in degrees.

    Returns:
        np.ndarray: Rotated image(s) as a flattened array or batch of arrays.
    """
    T = create_rotation_matrix(theta)
    return apply_transformation(image, T)

# rotate with matrix
rotated_45 = rotate_image(test_image, 45) 
rotated_90 = rotate_image(test_image, 90)
rotated_200 = rotate_image(test_image, 200)
rotated_270 = rotate_image(test_image, 270)

# visualize original and 4 augmentations in plotly image grid
all_images = np.stack([test_image, rotated_45, rotated_90, rotated_200, rotated_270])
plot_labels = ['Original', 'Rotated (45°)', 'Rotated (90°)', 'Rotated (200°)', 'Rotated (270°)']
show_images(all_images, labels=plot_labels, reshape=True)
In [161]:
grader.check("q4d")
Out[161]:

q4d
results:

q4d - 1
result:

    ❌ Test case failed
    Trying:
        assert create_rotation_matrix(15).shape == (784, 784), 'Rotation matrix should be 784x784'
    Expecting nothing
    ok
    Trying:
        gt_rotate_45_transform = np.load('public_solutions/rotate_45_transform.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_90_transform = np.load('public_solutions/rotate_90_transform.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_200_transform = np.load('public_solutions/rotate_200_transform.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_270_transform = np.load('public_solutions/rotate_270_transform.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_45_transform_updated = np.load('public_solutions/rotate_45_transform_updated.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_90_transform_updated = np.load('public_solutions/rotate_90_transform_updated.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_200_transform_updated = np.load('public_solutions/rotate_200_transform_updated.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_270_transform_updated = np.load('public_solutions/rotate_270_transform_updated.npy')
    Expecting nothing
    ok
    Trying:
        assert np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform) or np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform_updated), 'Rotate 45 image does not match solution'
    Expecting nothing
    **********************************************************************
    Line 10, in q4d 0
    Failed example:
        assert np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform) or np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform_updated), 'Rotate 45 image does not match solution'
    Exception raised:
        Traceback (most recent call last):
          File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in 
            assert np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform) or np.array_equal(rotate_image(test_image, 45), gt_rotate_45_transform_updated), 'Rotate 45 image does not match solution'
        AssertionError: Rotate 45 image does not match solution
    Trying:
        assert np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform) or np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform_updated), 'Rotate 90 image does not match solution'
    Expecting nothing
    **********************************************************************
    Line 11, in q4d 0
    Failed example:
        assert np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform) or np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform_updated), 'Rotate 90 image does not match solution'
    Exception raised:
        Traceback (most recent call last):
          File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in 
            assert np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform) or np.array_equal(rotate_image(test_image, 90), gt_rotate_90_transform_updated), 'Rotate 90 image does not match solution'
        AssertionError: Rotate 90 image does not match solution
    Trying:
        assert np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform) or np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform_updated), 'Rotate 200 image does not match solution'
    Expecting nothing
    **********************************************************************
    Line 12, in q4d 0
    Failed example:
        assert np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform) or np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform_updated), 'Rotate 200 image does not match solution'
    Exception raised:
        Traceback (most recent call last):
          File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in 
            assert np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform) or np.array_equal(rotate_image(test_image, 200), gt_rotate_200_transform_updated), 'Rotate 200 image does not match solution'
        AssertionError: Rotate 200 image does not match solution
    Trying:
        assert np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform) or np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform_updated), 'Rotate 270 image does not match solution'
    Expecting nothing
    **********************************************************************
    Line 13, in q4d 0
    Failed example:
        assert np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform) or np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform_updated), 'Rotate 270 image does not match solution'
    Exception raised:
        Traceback (most recent call last):
          File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in 
            assert np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform) or np.array_equal(rotate_image(test_image, 270), gt_rotate_270_transform_updated), 'Rotate 270 image does not match solution'
        AssertionError: Rotate 270 image does not match solution

Notice something? For some rotations, we are left with holes in the image.

Understanding Gaps in Rotated Images¶

When rotating an image, you may notice white spaces (gaps) in the output. These gaps occur due to the way nearest-neighbor interpolation works. Let’s explore this using a simple $3 \times 3$ image.


Original Image Grid

The pixel coordinates are:

$$ \begin{bmatrix} (0,0) & (0,1) & (0,2) \\ (1,0) & (1,1) & (1,2) \\ (2,0) & (2,1) & (2,2) \end{bmatrix} $$

The center of the image is at $(1,1)$.


Rotation by $45^\circ$

  1. Translate the center to the origin
    For pixel $(0,0)$:

    $$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix} $$

  2. Apply the rotation matrix
    The rotation matrix for $45^\circ$ is:

    $$ R(45^\circ) = \tfrac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} $$

    Applying the rotation:

    $$ \begin{bmatrix} x' \\ y' \end{bmatrix} = R(45^\circ) \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \tfrac{1}{\sqrt{2}} \begin{bmatrix} (-1) - (-1) \\ (-1) + (-1) \end{bmatrix} = \begin{bmatrix} 0 \\ -\sqrt{2} \end{bmatrix} \approx \begin{bmatrix} 0 \\ -1.4142 \end{bmatrix} $$

  3. Translate back to the original center

    $$ \begin{bmatrix} \text{new}_x \\ \text{new}_y \end{bmatrix} = \begin{bmatrix} 0 \\ -1.4142 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} \approx \begin{bmatrix} 1 \\ -0.4142 \end{bmatrix} $$


Nearest-Neighbor Assignment

To map the rotated pixel back to the grid, we round to the nearest integers:

$$ \text{new row} = \operatorname{round}(-0.4142) = 0, \quad \text{new column} = \operatorname{round}(1) = 1 $$

Thus, pixel $(0,0)$ maps to $(0,1)$ in the rotated image.
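The hand computation above can be checked numerically (a standalone sketch, not part of the assignment API):

```python
import numpy as np

# Rotate pixel (row=0, col=0) of a 3x3 grid by 45 degrees about the
# center (1, 1): translate, rotate, translate back, then round.
# Coordinates are (x, y) = (column, row), matching the derivation above.
theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

center = np.array([1.0, 1.0])
p = np.array([0.0, 0.0])              # pixel (0, 0)
rotated = R @ (p - center) + center   # translate to origin, rotate, translate back

new_col = int(np.round(rotated[0]))   # x component -> column
new_row = int(np.round(rotated[1]))   # y component -> row
print(rotated)            # approx [1.0, -0.4142]
print(new_row, new_col)   # 0 1, matching the derivation
```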


Why Do Gaps Appear?

When mapping all pixels:

  • Overlaps: Multiple original pixels may round to the same target coordinates.
  • Gaps: Some target coordinates are never assigned, leaving empty pixels (white spaces).

The rounding step in nearest-neighbor interpolation is the primary cause of these overlaps and gaps in the rotated image.
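This effect is easy to observe directly. The following sketch (a toy 9×9 grid, not the assignment's 28×28 images) forward-maps every source pixel through a 45° rotation and counts the target cells that no source pixel lands on; those uncovered cells are exactly the gaps:

```python
import numpy as np

# Forward-map each pixel of a 9x9 grid through a 45-degree rotation about
# the center and mark which target cells receive at least one source pixel.
h = w = 9
ci = cj = (h - 1) / 2.0
theta = np.deg2rad(45)
covered = np.zeros((h, w), dtype=bool)

for i in range(h):
    for j in range(w):
        x, y = j - cj, i - ci
        xr = np.cos(theta) * x - np.sin(theta) * y
        yr = np.sin(theta) * x + np.cos(theta) * y
        ti, tj = int(np.round(yr + ci)), int(np.round(xr + cj))
        if 0 <= ti < h and 0 <= tj < w:
            covered[ti, tj] = True   # rounding collisions overwrite each other

# Every False entry is a gap: a target pixel no source ever reaches.
print(np.count_nonzero(~covered), "of", h * w, "target pixels never get a value")
```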

Problem 4e: Bilinear Interpolation for Image Rotation¶

Task: When rotating an image, gaps (white spaces) can appear due to nearest-neighbor assignment. To avoid these gaps, set each output pixel to a weighted average of the 4 nearest source pixels. This approach is called bilinear interpolation and is common in image processing for producing smoother, gap-free results.

Steps:

  1. For each output pixel:
    • Translate its coordinates so that the rotation center is at the origin.
    • Apply the inverse rotation (i.e., rotate backward by the desired angle).
    • Translate the coordinates back to the original image space to locate the corresponding source position.
  2. If this source position falls outside the image, set the output pixel to 0.
  3. If the source position is inside the image: - Find the four nearest source pixels surrounding this position (top-left, top-right, bottom-left, bottom-right). - Compute the fractional distances from the source position to these neighbors (horizontal and vertical offsets). - Compute a weighted average of the four neighbor values using these offsets (bilinear interpolation).
  4. Assign the computed value to the output pixel. If any neighbor used in the interpolation falls outside the image, treat its value as 0.
  5. Repeat for all pixels.

This method uses inverse mapping (sampling from the original image) rather than forward mapping (mapping source pixels to output), which helps prevent gaps.
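Step 3 in miniature: for one fractional source position inside a 2×2 patch, the four bilinear weights always sum to 1, so the result is a convex combination of the neighbors (a toy example, separate from the matrix construction below):

```python
import numpy as np

# Interpolate at fractional offsets (di, dj) = (0.25, 0.75) from the
# top-left neighbor of a 2x2 patch of pixel values.
patch = np.array([[10.0, 20.0],
                  [30.0, 40.0]])
di, dj = 0.25, 0.75

w00 = (1 - di) * (1 - dj)   # weight for top-left
w01 = (1 - di) * dj         # weight for top-right
w10 = di * (1 - dj)         # weight for bottom-left
w11 = di * dj               # weight for bottom-right

value = (w00 * patch[0, 0] + w01 * patch[0, 1]
         + w10 * patch[1, 0] + w11 * patch[1, 1])
print(value)  # 22.5
```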

In [159]:
def create_bilinear_rotation_matrix(theta, height=28, width=28):
    """
    Create a (height*width, height*width) matrix that applies bilinear interpolation
    for rotating a flattened image by theta degrees.
    Each row of the matrix gives the weights for the input pixels that contribute to each output pixel.

    Args:
        theta (float): Angle of rotation in degrees.
        height (int): Height of the image.
        width (int): Width of the image.

    Returns:
        np.ndarray: A (height*width, height*width) transformation matrix for rotating.
    """
    theta_rad = np.deg2rad(theta)
    N = height * width
    T = np.zeros((N, N))
    center_i = height / 2.0
    center_j = width / 2.0
    
    # Rotation matrix for counterclockwise rotation
    cos_theta = np.cos(theta_rad)
    sin_theta = np.sin(theta_rad)
    
    # For each output pixel (i, j), find which input pixels contribute (bilinear interpolation)
    for i in range(height):
        for j in range(width):
            output_idx = i * width + j
            
            # Translate to center-origin coordinates
            x = j - center_j
            y = i - center_i
            
            # Apply inverse rotation (rotate backwards to find source)
            src_x = x * cos_theta + y * sin_theta
            src_y = -x * sin_theta + y * cos_theta
            
            # Translate back from center-origin
            src_j = src_x + center_j
            src_i = src_y + center_i
            
            # Bilinear interpolation: find the 4 surrounding pixels
            i0 = int(np.floor(src_i))
            i1 = i0 + 1
            j0 = int(np.floor(src_j))
            j1 = j0 + 1
            
            # Fractional parts for interpolation weights
            di = src_i - i0
            dj = src_j - j0
            
            # Get weights for the 4 corners: (i0,j0), (i0,j1), (i1,j0), (i1,j1)
            # Bilinear interpolation weights
            w00 = (1 - di) * (1 - dj)  # weight for (i0, j0)
            w01 = (1 - di) * dj        # weight for (i0, j1)
            w10 = di * (1 - dj)        # weight for (i1, j0)
            w11 = di * dj              # weight for (i1, j1)
            
            # Add weights to transformation matrix for valid pixels
            for ni, nj, weight in [(i0, j0, w00), (i0, j1, w01), (i1, j0, w10), (i1, j1, w11)]:
                if 0 <= ni < height and 0 <= nj < width:
                    input_idx = ni * width + nj
                    T[output_idx, input_idx] += weight
    
    return T


def rotate_image_bilinear(image, theta):
    """
    Rotate an image using bilinear interpolation.

    Args:
        image (np.ndarray): Flattened image array of shape (height*width,) or batch of images (N, height*width).
        theta (float): Angle of rotation in degrees.

    Returns:
        np.ndarray: Rotated image as a flattened array.
    """
    T = create_bilinear_rotation_matrix(theta)
    return apply_transformation(image, T)
    
# rotate with matrix
rotated = rotate_image(test_image, 45)
rotated_interpolated = rotate_image_bilinear(test_image, 45)

all_images = np.stack([test_image, rotated, rotated_interpolated])
plot_labels = ['Original',  'Rotated 45°', 'Rotated 45° (Bilinear)']
show_images(all_images, labels=plot_labels, reshape=True)
In [158]:
grader.check("q4e")
Out[158]:

q4e
passed! 🎉

Problem 4f: Composing Transformations¶

An advantage of transformation matrices is their composability: you can combine multiple transformations into a single matrix. This allows you to apply multiple transformations to an image with the same computational cost as applying just one.
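The claim can be checked numerically on toy vectors (4-element stand-ins for the 784-pixel images used below): applying T1 then T2 to x gives the same result as the single precomputed matrix T2 @ T1.

```python
import numpy as np

rng = np.random.default_rng(0)
T1 = rng.random((4, 4))
T2 = rng.random((4, 4))
x = rng.random(4)

sequential = T2 @ (T1 @ x)   # apply T1 first, then T2
composed = (T2 @ T1) @ x     # one precomputed matrix, same result

print(np.allclose(sequential, composed))  # True
```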

Task:

  1. Compose Multiple Transformations: Implement compose_transforms(*Ts), which takes any number of 784x784 transformation matrices (e.g., shift, rotate, blur) and returns a single matrix that represents applying all transformations in sequence. The transformations should be applied in the order they are provided: the first matrix is applied first, followed by the second, and so on.

  2. Rotate and Blur: Implement rotate_then_blur(image, theta, kernel_size), which rotates an image by theta degrees (without bilinear interpolation) and then applies a blur with a kernel of size kernel_size. Use compose_transforms to combine the transformations and apply them to the image.

  3. Shift, Rotate, and Blur: Implement shift_then_rotate_then_blur(image, dx, dy, theta, kernel_size), which shifts an image by (dx, dy), rotates it by theta degrees (without bilinear interpolation), and then applies a blur with a kernel of size kernel_size. Again, use compose_transforms to combine the transformations and apply them to the image.

In [90]:
def compose_transforms(*Ts):
  """
  Compose linear image transforms (each 784x784).
  Inputs:
    Ts: list of transformation matrices
  Returns:
    T_total: composition of all input transformations
  """
  # If no transforms, return identity
  if len(Ts) == 0:
    return np.eye(784)
  
  # Start with identity matrix
  T_total = np.eye(Ts[0].shape[0])
  
  # Apply transformations in order: T1, then T2, then T3, ...
  # For composition: if we apply T1 then T2, the combined matrix is T2 @ T1
  # (because (T2 @ T1) @ x = T2 @ (T1 @ x))
  # So we multiply from right to left: T_total = T_n @ ... @ T2 @ T1
  for T in Ts:
    T_total = T @ T_total
  
  return T_total

def rotate_then_blur(image, theta, kernel_size):
  """
  Rotate an image by theta degrees (without bilinear interpolation) and then blur it with a kernel of size kernel_size.
  """
  T_rotate = create_rotation_matrix(theta)
  T_blur = create_blur_matrix(kernel_size)
  T_composed = compose_transforms(T_rotate, T_blur)
  return apply_transformation(image, T_composed)

def shift_then_rotate_then_blur(image, dx, dy, theta, kernel_size):
  """
  Shift an image by (dx, dy), then rotate it by theta degrees (without bilinear interpolation), and then blur it with a kernel of size kernel_size.
  """
  T_shift = create_shift_matrix(dx, dy)
  T_rotate = create_rotation_matrix(theta)
  T_blur = create_blur_matrix(kernel_size)
  T_composed = compose_transforms(T_shift, T_rotate, T_blur)
  return apply_transformation(image, T_composed)

rotated_blurred_image = rotate_then_blur(test_image, 45, 3)
shifted_rotated_blurred_image = shift_then_rotate_then_blur(test_image, 1, -4, 200, 5)

all_images = np.stack([test_image, rotated_blurred_image, shifted_rotated_blurred_image])
plot_labels = ['Original', 'Rotated 45° + Blurred 3x3', 'Shifted (1, -4) + Rotated 200° + Blurred 5x5']
show_images(all_images, labels=plot_labels, reshape=True)
In [91]:
grader.check("q4f")
Out[91]:

q4f
results:

q4f - 1
result:

    ❌ Test case failed
    Trying:
        assert compose_transforms(create_rotation_matrix(45), create_blur_matrix(2)).shape == (784, 784), 'Compose transforms should return a 784x784 matrix'
    Expecting nothing
    ok
    Trying:
        gt_rotate_then_blur_transform = np.load('public_solutions/rotate_then_blur_transform.npy')
    Expecting nothing
    ok
    Trying:
        gt_shift_then_rotate_then_blur_transform = np.load('public_solutions/shift_then_rotate_then_blur_transform.npy')
    Expecting nothing
    ok
    Trying:
        gt_rotate_then_blur_transform_updated = np.load('public_solutions/rotate_then_blur_transform_updated.npy')
    Expecting nothing
    ok
    Trying:
        gt_shift_then_rotate_then_blur_transform_updated = np.load('public_solutions/shift_then_rotate_then_blur_transform_updated.npy')
    Expecting nothing
    ok
    Trying:
        assert np.array_equal(rotate_then_blur(test_image, 45, 2), gt_rotate_then_blur_transform) or np.array_equal(rotate_then_blur(test_image, 45, 3), gt_rotate_then_blur_transform_updated), 'Rotate then blur image does not match solution'
    Expecting nothing
    **********************************************************************
    Line 6, in q4f 0
    Failed example:
        assert np.array_equal(rotate_then_blur(test_image, 45, 2), gt_rotate_then_blur_transform) or np.array_equal(rotate_then_blur(test_image, 45, 3), gt_rotate_then_blur_transform_updated), 'Rotate then blur image does not match solution'
    Exception raised:
        Traceback (most recent call last):
          File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in 
            assert np.array_equal(rotate_then_blur(test_image, 45, 2), gt_rotate_then_blur_transform) or np.array_equal(rotate_then_blur(test_image, 45, 3), gt_rotate_then_blur_transform_updated), 'Rotate then blur image does not match solution'
        AssertionError: Rotate then blur image does not match solution
    Trying:
        assert np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 3), gt_shift_then_rotate_then_blur_transform, rtol=1e-05, atol=1e-08) or np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 5), gt_shift_then_rotate_then_blur_transform_updated, rtol=1e-05, atol=1e-08), 'Shift then rotate then blur image does not match solution'
    Expecting nothing
    **********************************************************************
    Line 7, in q4f 0
    Failed example:
        assert np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 3), gt_shift_then_rotate_then_blur_transform, rtol=1e-05, atol=1e-08) or np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 5), gt_shift_then_rotate_then_blur_transform_updated, rtol=1e-05, atol=1e-08), 'Shift then rotate then blur image does not match solution'
    Exception raised:
        Traceback (most recent call last):
          File "/Users/leonchen/miniconda3/envs/CS189/lib/python3.10/doctest.py", line 1350, in __run
            exec(compile(example.source, filename, "single",
          File "", line 1, in 
            assert np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 3), gt_shift_then_rotate_then_blur_transform, rtol=1e-05, atol=1e-08) or np.allclose(shift_then_rotate_then_blur(test_image, 1, -4, 200, 5), gt_shift_then_rotate_then_blur_transform_updated, rtol=1e-05, atol=1e-08), 'Shift then rotate then blur image does not match solution'
        AssertionError: Shift then rotate then blur image does not match solution

Problem 4g: Matrix Multiply Questions¶

  1. Does the order in which you apply transformations matter? Why or why not?
  2. When can a transformation be undone (i.e., when can you multiply your augmented image by another transformation matrix to recover the original image)? What matrix would you multiply by to recover the original image?
  3. Which of the augmentations implemented above can be "undone"? For augmentations that can be undone but may lose information (e.g., parts of the image are cut off), explain the conditions under which this occurs.
  4. Which of these augmentations cannot be "undone" with another matrix multiplication? Why not?
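As a starting point for questions 2–4, here is a toy numerical illustration (2-pixel "images", not the 784×784 matrices from this notebook): a flip is a permutation matrix, hence invertible (and its own inverse), while an averaging blur can collapse distinct inputs to the same output, in which case it is singular and has no exact inverse.

```python
import numpy as np

flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])   # swap the two pixels (a permutation matrix)
blur = np.full((2, 2), 0.5)     # replace each pixel by the mean of both

print(np.allclose(flip @ flip, np.eye(2)))   # True: flipping twice recovers the input
print(np.linalg.matrix_rank(blur))           # 1: rank-deficient, so not invertible
```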

Testing Augmentation on Classifier Performance¶

In this section, we will evaluate how our trained classifier performs on augmented versions of the test images. This will help us understand the robustness of the model to various transformations.

The goal is to analyze the impact of different augmentation techniques on the classifier's performance. Specifically, we will:

  1. Create Augmented Test Images:

    • Use the image augmentation functions (e.g., rotation, flipping, shifting, blurring) to generate transformed versions of the test images.
  2. Evaluate the Classifier:

    • Test the classifier on the augmented images.
    • Measure and compare the accuracy for each augmentation type.
  3. Visualize Results:

    • Plot the performance metrics to identify which augmentations the classifier handles well and which ones degrade performance.

Problem 4h: Augmenting Test Images¶

Task: Create augmented versions of the test images using the image augmentation functions we implemented earlier.

Steps:

  1. Apply each augmentation technique (e.g., horizontal flip, vertical flip, rotation, shifting, blurring) to a sample of 100 test images. This should result in 1300 images (13 augmentations $\times$ 100 test images)
  2. Store the augmented images in a structured format for evaluation.
  3. Ensure that the augmented images are labeled correctly for comparison with the classifier's predictions.
In [87]:
# Test augmentation functions on a few examples
test_images = np.stack(test_df['image'])
test_labels = test_df['label']

shift_inputs = [(5, 0), (-5, 0), (0, 5), (0, -5)]
rotate_inputs = [45, 90, 200]
blur_inputs = [3, 5]
rotate_blur_inputs = [(45, 3), (90, 5)]
shift_rotate_blur_inputs = [((5, 0), 45, 3), ((-5, 0), 90, 5)]

augmented_data = []
# Randomly sample 100 datapoints from test_images
sample_idx = np.random.choice(len(test_images), 100, replace=False)
test_images_sample = test_images[sample_idx]
test_labels_sample = np.array(test_labels)[sample_idx]

# TODO: Apply the augmentation functions we just created (shift, blur, rotate w/ bilinear, rotate then blur, shift then rotate then blur) to every image from test_images_sample
# use the inputs defined above to apply the augmentations
# Save the augmented images in a new DataFrame aug_df

# Apply horizontal flip
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
    augmented_data.append({
        'original_idx': sample_idx[orig_idx],
        'image': horizontal_flip(img),
        'label': label,
        'augmentation': 'horizontal_flip',
        'type': 'flip'
    })

# Apply vertical flip
for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
    augmented_data.append({
        'original_idx': sample_idx[orig_idx],
        'image': vertical_flip(img),
        'label': label,
        'augmentation': 'vertical_flip',
        'type': 'flip'
    })

# Apply shifts
for dx, dy in shift_inputs:
    for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
        augmented_data.append({
            'original_idx': sample_idx[orig_idx],
            'image': shift_image(img, dx, dy),
            'label': label,
            'augmentation': f'shift_{dx}_{dy}',
            'type': 'shift'
        })

# Apply rotations (with bilinear interpolation)
for theta in rotate_inputs:
    for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
        augmented_data.append({
            'original_idx': sample_idx[orig_idx],
            'image': rotate_image_bilinear(img, theta),
            'label': label,
            'augmentation': f'rotate_{theta}',
            'type': 'rotate'
        })

# Apply blur
for kernel_size in blur_inputs:
    for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
        augmented_data.append({
            'original_idx': sample_idx[orig_idx],
            'image': blur_image(img, kernel_size),
            'label': label,
            'augmentation': f'blur_{kernel_size}x{kernel_size}',
            'type': 'blur'
        })

# Apply rotate then blur
for theta, kernel_size in rotate_blur_inputs:
    for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
        augmented_data.append({
            'original_idx': sample_idx[orig_idx],
            'image': rotate_then_blur(img, theta, kernel_size),
            'label': label,
            'augmentation': f'rotate_{theta}_blur_{kernel_size}x{kernel_size}',
            'type': 'rotate_blur'
        })

# Apply shift then rotate then blur
for (dx, dy), theta, kernel_size in shift_rotate_blur_inputs:
    for orig_idx, (img, label) in enumerate(zip(test_images_sample, test_labels_sample)):
        augmented_data.append({
            'original_idx': sample_idx[orig_idx],
            'image': shift_then_rotate_then_blur(img, dx, dy, theta, kernel_size),
            'label': label,
            'augmentation': f'shift_{dx}_{dy}_rotate_{theta}_blur_{kernel_size}x{kernel_size}',
            'type': 'shift_rotate_blur'
        })

# Create DataFrame
aug_df = pd.DataFrame(augmented_data)

# TODO: Select an image and visualize it with all the augmentations applied to it
# Select first image (index 0) and show all its augmentations
# Count unique augmentation types
unique_augs = aug_df['augmentation'].unique()
first_image_idx = 0
first_image_augs = aug_df[aug_df.index.isin([first_image_idx + i * len(test_images_sample) for i in range(len(unique_augs))])].copy()

if len(first_image_augs) > 0:
    aug_images = np.stack(first_image_augs['image'].tolist())
    aug_labels_list = [f"{aug} ({t})" for aug, t in zip(first_image_augs['augmentation'], first_image_augs['type'])]
    fig = show_images(aug_images, max_images=len(aug_images), ncols=5, labels=aug_labels_list, reshape=True)
    fig.show()
In [85]:
grader.check("q4h")
Out[85]:

q4h
passed! 🍀

Problem 4i: Evaluating Classifier Performance on Augmented Data¶

Task: Evaluate the classifier's performance on the augmented test data and compare its accuracy across different types of augmentations. Create a DataFrame named aug_performance with the following columns:

  • augmentation: A string describing the applied augmentation (e.g., "shift_5_0", "rotate_90", "blur_2x2").
  • accuracy: The classifier's accuracy on the augmented data.
  • type: The augmentation type (e.g., blur, rotate, shift, rotate_blur, shift_rotate_blur, none).

Hints:

  1. Check the image column's data type and shape. The model likely expects a 3D array. Use np.stack to combine all augmented images in your DataFrame before scaling and passing them to the model.
  2. Use scikit-learn's StandardScaler to scale the data before evaluation.
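A minimal sketch of hint 2, with random arrays standing in for the real images (in the notebook you would fit the scaler on the training images and reuse it on the augmented test images):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train_flat = rng.random((50, 784)) * 255   # stand-in for flattened training images
aug_flat = rng.random((10, 784)) * 255     # stand-in for flattened augmented images

scaler = StandardScaler().fit(train_flat)  # learn per-pixel mean/std from training data
aug_scaled = scaler.transform(aug_flat)    # apply the SAME statistics to augmented data
print(aug_scaled.shape)  # (10, 784)
```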
In [162]:
# Evaluate classifier performance on augmented data
from sklearn.preprocessing import StandardScaler

# First, compute predictions for all augmented images
aug_results = []

# Group by augmentation type
for aug_name in aug_df['augmentation'].unique():
    aug_subset = aug_df[aug_df['augmentation'] == aug_name]
    aug_type = aug_subset['type'].iloc[0]
    
    # Get images and labels
    aug_images = np.stack(aug_subset['image'].tolist())
    aug_labels = aug_subset['label'].tolist()
    
    # Scale the augmented images (normalize to [0, 1] like training data)
    aug_images_sc = aug_images / 255.0
    
    # Make predictions
    aug_predictions = model.predict(aug_images_sc)
    
    # Store results for each image
    for label, pred in zip(aug_labels, aug_predictions):
        aug_results.append({
            'augmentation': aug_name,
            'type': aug_type,
            'correct': (label == pred)
        })

# Add baseline (no augmentation) performance
baseline_images = np.stack(test_images_sample)
baseline_labels = test_labels_sample
baseline_images_sc = baseline_images / 255.0
baseline_predictions = model.predict(baseline_images_sc)

for label, pred in zip(baseline_labels, baseline_predictions):
    aug_results.append({
        'augmentation': 'none',
        'type': 'none',
        'correct': (label == pred)
    })

# Create DataFrame with individual results
aug_results_df = pd.DataFrame(aug_results)

# Use groupby and agg to compute accuracy for each augmentation
aug_performance = aug_results_df.groupby(['augmentation', 'type']).agg({
    'correct': 'mean'
}).reset_index()
aug_performance.columns = ['augmentation', 'type', 'accuracy']

# Sort by accuracy
aug_performance = aug_performance.sort_values('accuracy', ascending=False)

print(aug_performance)

# Visualize performance: sort by accuracy, color by augmentation type (blur, rotate, shift, none)
fig = px.bar(
    aug_performance,
    x='augmentation',
    y='accuracy',
    color='type',
    title='Classifier Performance on Augmented Data',
    labels={'augmentation': 'Augmentation Type', 'accuracy': 'Accuracy'},
    text_auto='.3f'
)
fig.update_xaxes(tickangle=45)
fig.update_layout(height=600)
fig.show()
                     augmentation               type  accuracy
3                            none               none      0.88
0                        blur_3x3               blur      0.85
1                        blur_5x5               blur      0.77
2                 horizontal_flip               flip      0.58
12                      shift_0_5              shift      0.50
11                     shift_0_-5              shift      0.37
9                      shift_-5_0              shift      0.28
13                      shift_5_0              shift      0.23
15                  vertical_flip               flip      0.23
14   shift_5_0_rotate_45_blur_3x3  shift_rotate_blur      0.15
4                      rotate_200             rotate      0.13
5                       rotate_45             rotate      0.07
6              rotate_45_blur_3x3        rotate_blur      0.05
7                       rotate_90             rotate      0.04
8              rotate_90_blur_5x5        rotate_blur      0.04
10  shift_-5_0_rotate_90_blur_5x5  shift_rotate_blur      0.01
In [163]:
grader.check("q4i")
Out[163]:

q4i
passed! 💯

Problem 4j: Analysis of Augmentation Techniques¶

Among the augmentation techniques, which performed the best and which performed the worst? Why do you think this is the case? Provide reasoning based on the nature of the augmentations and their impact on the model's ability to generalize.

Answer:

Based on the aug_performance DataFrame results:

Best performing augmentation:

  • The unaugmented baseline (none, 0.88) scores highest; among the true augmentations, blur_3x3 (0.85) performs best.
  • Reasoning: a light 3×3 blur smooths pixel-level noise but preserves each garment's overall shape and spatial layout, so the features the classifier learned during training remain largely intact.

Worst performing augmentation:

  • shift_-5_0_rotate_90_blur_5x5 (0.01) performs worst, followed by rotate_90 and rotate_90_blur_5x5 (both 0.04) and rotate_45_blur_3x3 (0.05).
  • Reasoning:
    • Large rotations (45°–200°): the training images are consistently upright, so rotating an item drastically changes the spatial relationships the model relies on for classification.
    • Heavy blur (5×5): removes the fine-grained detail that distinguishes visually similar classes.
    • Complex compositions (shift + rotate + blur): compound their individual distortions, pushing accuracy close to chance.

General observations:

  • Transformations that preserve local pixel neighborhoods (light blur, small shifts) degrade performance least.
  • Transformations that introduce information loss (heavy blur) or significant geometric distortion (rotations, vertical flips) hurt most; even horizontal_flip drops accuracy to 0.58, since many garments are not left-right symmetric at the pixel level.
  • The model's performance reflects its sensitivity to the specific spatial features it learned during training.
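If you prefer to pull the best and worst rows out of the DataFrame programmatically rather than reading them off the printout, `idxmax`/`idxmin` work on any `aug_performance`-shaped frame. A minimal sketch on toy data (the values here are illustrative, copied from the table above):

```python
import pandas as pd

# Toy frame shaped like aug_performance: (augmentation, type, accuracy)
perf = pd.DataFrame({
    'augmentation': ['none', 'blur_3x3', 'rotate_90'],
    'type':         ['none', 'blur',     'rotate'],
    'accuracy':     [0.88,   0.85,       0.04],
})

# idxmax/idxmin return the row label of the extreme value;
# .loc then retrieves the full row
best = perf.loc[perf['accuracy'].idxmax()]
worst = perf.loc[perf['accuracy'].idxmin()]
print(f"best:  {best['augmentation']} ({best['accuracy']:.2f})")
print(f"worst: {worst['augmentation']} ({worst['accuracy']:.2f})")
```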

You will be doing a LOT of matrix multiplication this semester, so get comfortable with these operations; they are fundamental to many of the machine learning algorithms you'll encounter!
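As a quick warm-up (not graded), NumPy's `@` operator computes the matrix product; shapes must align as (m, k) @ (k, n) → (m, n):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])   # shape (2, 2)
B = np.array([[5, 6],
              [7, 8]])   # shape (2, 2)

# Matrix product: C[i, j] = sum_k A[i, k] * B[k, j]
C = A @ B
print(C)  # [[19 22]
          #  [43 50]]
```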

Before you submit, ensure save_models is True¶

In [82]:
assert save_models and load_saved_models, "save_models and load_saved_models must be True"

assert os.path.exists('classifier.joblib'), "classifier.joblib should exist"

Now that we have gotten familiar with pandas, numpy, and the classic training loop, let's look into how we can debug and improve classifiers!

Submission¶

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. Please save before exporting!

In [ ]:
## Use this cell if you are running the notebook in Google Colab to install the necessary dependencies; this may take a few minutes
if IS_COLAB:
    !apt-get install -y texlive texlive-xetex pandoc
In [ ]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True, files=['classifier.joblib'])